This IPython notebook was created as a companion to the "Advanced Python for Data Analysis" tutorial and class taught at Investigative Reporters and Editors' Computer-Assisted Reporting conference in March 2015 in Atlanta, Ga.
Now that we're comfortable with our coding environment, and we've dabbled with the IPython notebook, it's time to dive into pandas, the Python library created explicitly for data analysis.
First, in our interactive IPython notebook, we'll import pandas and call it "pd", a convention among pandas users that you'll quickly grow accustomed to.
In [1]:
import pandas as pd
Today, we're going to work with a fairly large text file of car accident data from the New Jersey Department of Transportation. This file contains reports from accidents in the Garden State between 2008 and 2013. The data, originally filed in handwritten reports by state troopers and then typed into a fixed-width database by clerks, is both decently large and very messy. (As an aside, if you're really interested in the genesis of this particular file, you can browse the makefile we used to create it.)
Our analysis today will not be comprehensive or particularly accurate, but working with 1.7 million rows of dirty data is a pretty good way to illustrate what you can do with pandas and how it's going to make your life easier.
The first thing we have to do is ingest our data from a source into a pandas DataFrame object. Think of a DataFrame (similar to R's data frame) as a virtual spreadsheet. It has columns with unique names and rows with unique row numbers, which we call "the Index". You can read in data from many kinds of sources: JSON, Excel files, HTML on the web, SQL databases and others. Today, we're going to work with a CSV. First, we'll create a variable with the name of our CSV file, then we'll use pandas' .read_csv() function.
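For reference, here's roughly what those other readers look like. The file names and URL below are hypothetical stand-ins, not files from this tutorial:
#hypothetical sources, just to show the parallel reader functions
#df = pd.read_excel('accidents.xlsx')            # an Excel workbook
#df = pd.read_json('accidents.json')             # a JSON file
#tables = pd.read_html('http://example.com/t')   # tables scraped from a web page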
In [2]:
datafile = "njaccidents.csv"
df = pd.read_csv(datafile)
Now we have a DataFrame object with the name df. Let's use the .head() method to look at the first five rows of data. Does it look familiar?
In [3]:
df.head()
Out[3]:
It's a table!
To see what we have to work with here, let's list the columns using the DataFrame's .columns attribute.
In [4]:
#.columns returns an Index object that acts like a list, so you can pluck items out of it.
df.columns
Out[4]:
To get a quick overview of what kind of data is in each column, we can try the .describe() method.
In [5]:
df.describe()
Out[5]:
In earlier versions of pandas (pre-0.15), .describe() gives you the summary statistics for the numeric columns only. In newer versions, passing the include='all' keyword argument, as in .describe(include='all'), will summarize all the columns.
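If you're on pandas 0.15 or newer, you can try that for yourself:
df.describe(include='all')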
For categorical columns, like 'Severity', you'll get the number of unique values and the most frequent one.
In [6]:
df['Severity'].describe()
Out[6]:
Try it again for 'County Name'.
In [7]:
df['County Name'].describe()
Out[7]:
Say you want to select a single column. You can do this in one of two ways. If the column name is a string without spaces, you can use dot notation, like df.Severity, for instance. Otherwise, you use a syntax similar to what we're used to with dicts, using brackets, like this: df['County Name']. If you want to grab more than one column, give it a list of column names, as in the example below.
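For instance, passing a list of two column names returns a two-column DataFrame:
#select two columns at once by passing a list of names
df[['County Name', 'Severity']].head()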
Now, let's take a look at cleaning messy data in columns. Why does this return an empty DataFrame?
In [8]:
df[df['County Name']=='Passaic']
Out[8]:
With some digging, by using something like df[df['County Name'].str.contains('PASSAIC')], you can get a DataFrame of rows that have PASSAIC in the 'County Name' column.
In [9]:
df[df['County Name'].str.contains('PASSAIC')].head()
Out[9]:
When we select a single row with the .ix[] indexer, you'll notice that the 'County Name' column includes a bunch of trailing whitespace: 'PASSAIC '. Normally, in Python, you might solve this by writing a for loop that cycles through every item in the column and cleans it up one at a time, like the sketch below.
In [10]:
df['County Name'][df['County Name'].str.contains('PASSAIC')].ix[242727]
Out[10]:
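That loop-based cleanup might look something like this sketch (assuming every value in the column is a string):
#the slow way: strip each value one at a time, then assign the list back
cleaned = []
for name in df['County Name']:
    cleaned.append(name.strip())
#df['County Name'] = cleaned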
But with pandas, we can do the same thing much faster. We'll use the .map() function to run the .strip() method on every string in the column at the same time.
In [11]:
df['County Name']=df['County Name'].map(str.strip)
You can try something similar yourself by finding the Police Department values that contain the string BLOOMINGDALE and cleaning them up with the same strip pattern, if you want to (a sketch follows the output below).
In [12]:
df['Police Department'][df['Police Department'].str.contains('BLOOMINGDALE')]
Out[12]:
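If you do want to clean those up, the pattern is the same one we used for 'County Name' (again assuming every value in the column is a string):
df['Police Department'] = df['Police Department'].map(str.strip)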
Once our 'County Name' field is cleaned, we can filter the table by its values with a == comparison, returning a view of the DataFrame that only shows rows with accidents that happened in Passaic County.
In [13]:
df[df['County Name']=='PASSAIC']
Out[13]:
If we're confident the 'County Name'
column is as clean as it's going to be, we can turn to others. What do we see when we look at the unique values in the 'Police Dept Code'
column with the .unique()
method?
In [14]:
df['Police Dept Code'].unique()
Out[14]:
If we filter the column for the police departments that use '99' as their code, we'll see their names are not unique (of course).
In [15]:
df[['Police Dept Code', 'Police Department']][df['Police Dept Code']==99]
Out[15]:
You also might notice that some of our department codes start with 0, so we have data that is mixed between strings and integers in the same column. We can use pandas' .astype() method to change the type of all of the values in that column to strings.
In [16]:
df['Police Dept Code']=df['Police Dept Code'].astype(str)
df['Police Dept Code'].unique()
Out[16]:
You should note two things here. To actually change the values of the column in the DataFrame in place, we have to assign the result back to the column, as if we're defining a new variable. Also, if you check the unique values again, you'll see that there are now values for both '01' and '1'. You may want to standardize those; one approach is sketched below, but we'll leave the rest to you as an exercise later.
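Here's a minimal sketch of that standardization, assuming '1' and '01' are the only pair that collides:
#map the stray '1' onto '01' so the codes line up
df['Police Dept Code'] = df['Police Dept Code'].replace('1', '01')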
Even after all that, we still have some empty fields containing only spaces, so let's replace those empty values with the word "Unknown". We'll use .loc to assign to that slice of the column in place.
In [17]:
df.loc[df['Police Dept Code']=='  ', 'Police Dept Code'] = 'Unknown'
We've done a bit of cleaning here and may want to start doing some exploratory analysis. To do that, we'll create a smaller DataFrame of just the columns that we plan to examine, and we'll name it myframe.
In [18]:
#Create a smaller frame
myframe = df[['County Name', 'Municipality Name', 'Crash Date', 'Crash Day Of Week', 'Crash Time', 'Total Killed', 'Total Injured', 'Pedestrians Killed', 'Pedestrians Injured', 'Total Vehicles Involved', 'Crash Type Code', 'Alcohol Involved', 'Environmental Condition', 'Light Condition', 'Cell Phone In Use Flag']]
We'll pass myframe along to the next exercise for aggregating, but we might want to save it for later. While we're thinking about it, let's write it out to a CSV with the shockingly named .to_csv() method.
In [19]:
myframe.to_csv('smallertab.csv')
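Note that by default, .to_csv() also writes the DataFrame's index out as an unnamed first column. We'll keep the default here, but if you don't want that column, the method accepts an index keyword:
#skip writing the row index into the file
#myframe.to_csv('smallertab.csv', index=False)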
We've done a bit of cleaning to our outrageously messy data, and now we want to start to explore what's actually in this data and look for more spots we have to clean.
If you were reporting on car accidents in your state, what would you want to know? Some of the questions you'd ask might include: Which county had the most deaths? How do crashes rise and fall over time? How many people are involved in a typical accident?
We'll start by reading our data into our DataFrame. Then we'll filter it and do a few group-bys, which should be familiar if you've ever worked with SQL. Finally, we'll create smaller DataFrames of this aggregated data to visualize.
One of the cleaning tasks we'll have to deal with is the format of the 'Crash Date' field, so we might as well import Python's standard datetime module and read our CSV now.
In [20]:
from datetime import datetime
newframe = pd.read_csv('smallertab.csv')
This should all be pretty familiar. Notice that weird warning about mixed types? Although we cleaned the types in some of the columns, others are still mixed. You can solve this with the dtype keyword argument in the read_csv() function, as sketched below. It's worth looking at the docs to see all the options you can set when you read a CSV.
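A minimal sketch of that fix; the column named here is just an example, so check the warning message for the actual offender:
#force a mixed column to be read as strings from the start
newframe = pd.read_csv('smallertab.csv', dtype={'Crash Type Code': str})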
Now, we want to see our data aggregated by counties, so here's where pandas' .groupby() function comes in.
In [21]:
newframe.groupby('County Name')
Out[21]:
Now, when we run .groupby(), it returns a DataFrameGroupBy object, which is really only useful to us if we do something else with it.
So try this:
In [22]:
#What county had the most accident deaths?
newframe.groupby('County Name').sum()
Out[22]:
Let's deconstruct this a bit, because we're chaining a lot together at once here. We start with our DataFrame object, called newframe. We then call .groupby(), passing it the name of the column we're aggregating on. As we said, this is reminiscent of SQL, and it returns the DataFrameGroupBy object. We can then run .sum() on the GroupBy object to add up the numeric columns for each county, returning a new DataFrame that looks like the kind of pivot table we'd make in Excel.
Say we want to know which county had the most people killed in accidents over this time period. We can slice out just the 'Total Killed' column and sort it from most to least.
In [23]:
newframe.groupby('County Name').sum().iloc[:,1].order(ascending=False)
Out[23]:
We select the 'Total Killed' column here using .iloc[], which takes two indexers: one for the rows and one for the columns. The first is a slice of the rows. Since we want all rows, we use the standard Python slicing notation of an empty colon. The second is the column we want, which is the second column, or column 1 if you're counting from 0, as Python does.
Finally, the .order() function sorts the resulting Series; we pass it the ascending keyword with a value of False (it defaults to True) to get the largest figures at the top of the list.
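If counting column positions feels fragile, you can get the same result by asking for the column by its label instead (note that in newer versions of pandas, .order() has been renamed .sort_values()):
#the same result, selecting the column by name rather than position
newframe.groupby('County Name').sum()['Total Killed'].order(ascending=False)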
We can assign that to a new variable that we'll use later to make a bar chart, and move on for now.
In [24]:
countydeaths = newframe.groupby('County Name').sum().iloc[:,1].order(ascending=False)
Now we want to aggregate the accidents by date. But you may have noticed that the dates here are strings, not actual Python date objects. That means they sort as text: 01/01/2008, 01/01/2009, 01/01/2010 and so on, rather than chronologically. We need to convert the strings to actual Python dates.
In [25]:
#lambda creates an unnamed, one-line function; .apply() runs it on each item in the column
newframe['Crash Date']=newframe['Crash Date'].apply(lambda x: datetime.strptime(x, "%m/%d/%Y").date())
What did we do here? We take the 'Crash Date' column and use the .apply() function to perform some operation on the value in every row of that column. We use Python's lambda, a way to write an unnamed, one-line function. With the date string in each row of the 'Crash Date' column as x, apply and lambda run the .strptime() function from Python's datetime module. The part in double quotes, "%m/%d/%Y", says to take a string that is a two-digit month, a two-digit day and a four-digit year, divided by slashes, and convert it to a datetime object. Finally, .date() takes just the date part of that object and returns it to the row. That's a mouthful.
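As an aside, pandas has a built-in parser that can do the same conversion in a single call, without the lambda; just note that it produces pandas Timestamps rather than plain Python dates:
#the pandas equivalent of the strptime dance above
#newframe['Crash Date'] = pd.to_datetime(newframe['Crash Date'], format="%m/%d/%Y")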
Now that we've done that, we can aggregate the crashes by real, sortable dates and count how many accidents happened on each date. Then we'll assign this to a new variable for the resulting DataFrame that we'll use later to make a line chart.
In [26]:
crashesbydate = newframe.groupby('Crash Date').count().iloc[:,0]
So far we've just been transforming and tweaking data that already exists in our dataframe. We can also calculate new data and add it as a column. Here we can create a column for the total number of people involved in an accident by adding together the 'Total Killed' and 'Total Injured' columns. This syntax works similarly to creating a new key in a Python dict.
In [27]:
newframe['Total Involved']=newframe['Total Killed']+newframe['Total Injured']
With this 'Total Involved' value, we can create a histogram later.
In [28]:
newframe['Crash Day Of Week']=newframe['Crash Day Of Week'].map(str.strip)
Here, we're again using .map() to run Python's built-in str.strip on every value in the column, which is a Series object.
Finally, let's create a DataFrame of all the columns aggregated by county so we can make a scatter plot of the counties by total killed and pedestrians killed.
In [29]:
countyframe = newframe.groupby('County Name').sum()
With these new DataFrames, we have several slices of our data. Let's see what it looks like by visualizing it.
First, we need to launch pylab mode inline to get the plots to render inside our notebook. You can do this at startup by running ipython notebook --pylab=inline, or you can do it like this with the % magic.
In [30]:
%pylab inline
Next, we import matplotlib's pyplot module and call it plt.
In [31]:
import matplotlib.pyplot as plt
If we want to make a bar chart of the number of deaths by county from the countydeaths Series we created above, we can use the .plot() function and pass it the kind='bar' keyword argument.
These plots are basically pandas' wrapper around the matplotlib library. There are many other things you can do with the wrapper and with matplotlib itself, and it's worth giving the documentation a read.
In [32]:
countydeaths.plot(kind='bar')
Out[32]:
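One example of what the wrapper gives you: .plot() hands back a matplotlib Axes object, so you can keep styling the chart with ordinary matplotlib calls. A quick sketch (the title text here is ours, not something from the data):
#style the bar chart with matplotlib Axes methods
ax = countydeaths.plot(kind='bar')
ax.set_title('Accident deaths by county, 2008-2013')
ax.set_ylabel('Total killed')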
If we want to do a line graph, we can use the .plot() function on a DataFrame or Series; the default kind is a line graph. So here's a graph of crashes by date.
In [33]:
crashesbydate.plot()
Out[33]:
And here's the histogram of our new 'Total Involved' column, with 50 bins.
In [34]:
newframe['Total Involved'].hist(bins=50)
Out[34]:
Finally, for now, we can scatter plot our county data by the total number of people killed versus the number of pedestrians killed.
In [35]:
#let's scatter plot county name by total killed and pedestrians killed.
countyframe.plot(kind='scatter', x='Total Killed', y='Pedestrians Killed')
Out[35]:
I hope this was helpful. If you have questions or spot bugs, you can find me on GitHub, via email or on Twitter.